Results 1 - 20 of 37
1.
bioRxiv ; 2024 Apr 12.
Article in English | MEDLINE | ID: mdl-38645026

ABSTRACT

Identifying bacterial protein-protein interactions and predicting the structures of the resulting complexes could aid in understanding pathogenicity mechanisms and developing treatments for infectious diseases. Here, we developed a deep learning-based pipeline that leverages residue-residue coevolution and protein structure prediction to systematically identify and structurally characterize protein-protein interactions at the proteome-wide scale. Using this pipeline, we searched through 78 million pairs of proteins across 19 human bacterial pathogens and identified 1923 confidently predicted complexes involving essential genes and 256 involving virulence factors. Many of these complexes were not previously known; we experimentally tested 12 such predictions, and half of them were validated. The predicted interactions span core metabolic and virulence pathways ranging from post-transcriptional modification to acid neutralization to outer membrane machinery and should contribute to our understanding of the biology of these important pathogens and the design of drugs to combat them.

2.
Science ; 384(6693): eadl2528, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38452047

ABSTRACT

Deep-learning methods have revolutionized protein structure prediction and design but are presently limited to protein-only systems. We describe RoseTTAFold All-Atom (RFAA), which combines a residue-based representation of amino acids and DNA bases with an atomic representation of all other groups to model assemblies that contain proteins, nucleic acids, small molecules, metals, and covalent modifications, given their sequences and chemical structures. By fine-tuning on denoising tasks, we developed RFdiffusion All-Atom (RFdiffusionAA), which builds protein structures around small molecules. Starting from random distributions of amino acid residues surrounding target small molecules, we designed and experimentally validated, through crystallography and binding measurements, proteins that bind the cardiac disease therapeutic digoxigenin, the enzymatic cofactor heme, and the light-harvesting molecule bilin.


Subject(s)
Amino Acids, Proteins, Proteins/chemistry, DNA/chemistry, Crystallography
3.
Nat Methods ; 21(1): 117-121, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37996753

ABSTRACT

Protein-RNA and protein-DNA complexes play critical roles in biology. Despite considerable recent advances in protein structure prediction, the prediction of the structures of protein-nucleic acid complexes without homology to known complexes is a largely unsolved problem. Here we extend the RoseTTAFold machine learning protein-structure-prediction approach to additionally predict nucleic acid and protein-nucleic acid complexes. We develop a single trained network, RoseTTAFoldNA, that rapidly produces three-dimensional structure models with confidence estimates for protein-DNA and protein-RNA complexes. Here we show that confident predictions have considerably higher accuracy than current state-of-the-art methods. RoseTTAFoldNA should be broadly useful for modeling the structure of naturally occurring protein-nucleic acid complexes, and for designing sequence-specific RNA and DNA-binding proteins.


Subject(s)
Nucleic Acids, RNA/chemistry, DNA-Binding Proteins/chemistry, DNA/chemistry
4.
Nature ; 614(7949): 774-780, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36813896

ABSTRACT

De novo enzyme design has sought to introduce active sites and substrate-binding pockets that are predicted to catalyse a reaction of interest into geometrically compatible native scaffolds [1,2], but has been limited by a lack of suitable protein structures and the complexity of native protein sequence-structure relationships. Here we describe a deep-learning-based 'family-wide hallucination' approach that generates large numbers of idealized protein structures containing diverse pocket shapes and designed sequences that encode them. We use these scaffolds to design artificial luciferases that selectively catalyse the oxidative chemiluminescence of the synthetic luciferin substrates diphenylterazine [3] and 2-deoxycoelenterazine. The designed active sites position an arginine guanidinium group adjacent to an anion that develops during the reaction in a binding pocket with high shape complementarity. For both luciferin substrates, we obtain designed luciferases with high selectivity; the most active of these is a small (13.9 kDa) and thermostable (with a melting temperature higher than 95 °C) enzyme that has a catalytic efficiency on diphenylterazine (kcat/Km = 10⁶ M⁻¹ s⁻¹) comparable to that of native luciferases, but a much higher substrate specificity. The creation of highly active and specific biocatalysts from scratch with broad applications in biomedicine is a key milestone for computational enzyme design, and our approach should enable generation of a wide range of luciferases and other enzymes.


Subject(s)
Deep Learning, Luciferases, Biocatalysis, Catalytic Domain, Enzyme Stability, Hot Temperature, Luciferases/chemistry, Luciferases/metabolism, Luciferins/metabolism, Luminescence, Oxidation-Reduction, Substrate Specificity
5.
bioRxiv ; 2023 Dec 21.
Article in English | MEDLINE | ID: mdl-38187589

ABSTRACT

A general method for designing proteins to bind and sense any small molecule of interest would be widely useful. Because a small molecule offers few atoms to interact with, binding small molecules with high affinity requires highly shape-complementary pockets, and transducing binding events into signals is challenging. Here we describe an integrated deep-learning and energy-based approach for designing high-shape-complementarity binders to small molecules that are poised for downstream sensing applications. We employ deep-learning-generated pseudocycles with repeating structural units surrounding central pockets; depending on the geometry of the structural unit and repeat number, these pockets span wide ranges of sizes and shapes. For a small molecule target of interest, we extensively sample high-shape-complementarity pseudocycles to generate large numbers of customized potential binding pockets; the ligand binding poses and the interacting interfaces are then optimized for high-affinity binding. We computationally design binders to four diverse molecules, including for the first time polar flexible molecules such as methotrexate and thyroxine; the binders are expressed at high levels and have nanomolar affinities straight out of the computer. Co-crystal structures are nearly identical to the design models. Taking advantage of the modular repeating structure of pseudocycles and the central location of the binding pockets, we constructed low-noise nanopore sensors and chemically induced dimerization systems by splitting the binders into domains which assemble into the original pseudocycle pocket upon target molecule addition.

6.
Science ; 377(6604): 387-394, 2022 Jul 22.
Article in English | MEDLINE | ID: mdl-35862514

ABSTRACT

The binding and catalytic functions of proteins are generally mediated by a small number of functional residues held in place by the overall protein structure. Here, we describe deep learning approaches for scaffolding such functional sites without needing to prespecify the fold or secondary structure of the scaffold. The first approach, "constrained hallucination," optimizes sequences such that their predicted structures contain the desired functional site. The second approach, "inpainting," starts from the functional site and fills in additional sequence and structure to create a viable protein scaffold in a single forward pass through a specifically trained RoseTTAFold network. We use these two methods to design candidate immunogens, receptor traps, metalloproteins, enzymes, and protein-binding proteins and validate the designs using a combination of in silico and experimental tests.


Subject(s)
Deep Learning, Protein Engineering, Proteins, Binding Sites, Catalysis, Protein Binding, Protein Engineering/methods, Protein Folding, Protein Secondary Structure, Proteins/chemistry
7.
Brief Bioinform ; 23(4), 2022 Jul 18.
Article in English | MEDLINE | ID: mdl-35641150

ABSTRACT

Mutations in human proteins lead to diseases. The structure of these proteins can help understand the mechanism of such diseases and develop therapeutics against them. With improved deep learning techniques, such as RoseTTAFold and AlphaFold, we can predict the structure of proteins even in the absence of structural homologs. We modeled and extracted the domains from 553 disease-associated human proteins without known protein structures or close homologs in the Protein Data Bank. We noticed that the model quality was higher and the root mean square deviation (RMSD) between AlphaFold and RoseTTAFold models was lower for domains that could be assigned to CATH families than for those that could only be assigned to Pfam families of unknown structure or could not be assigned to either. We predicted ligand-binding sites, protein-protein interfaces and conserved residues in these predicted structures. We then explored whether the disease-associated missense mutations were in the proximity of these predicted functional sites, whether they destabilized the protein structure based on ddG calculations or whether they were predicted to be pathogenic. We could explain 80% of these disease-associated mutations based on proximity to functional sites, structural destabilization or pathogenicity. When compared to polymorphisms, a larger percentage of disease-associated missense mutations were buried, closer to predicted functional sites, and predicted to be destabilizing and pathogenic. Using models from the two state-of-the-art techniques provides better confidence in our predictions, and we explain 93 additional mutations based on RoseTTAFold models which could not be explained based solely on AlphaFold models.


Subject(s)
Missense Mutation, Proteins, Protein Databases, Humans, Molecular Models, Mutation, Proteins/chemistry, Proteins/genetics
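The RMSD comparison between AlphaFold and RoseTTAFold models mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the two models have already been superimposed and that atoms are listed in matching order (for example, one C-alpha atom per residue); the abstract does not say how the authors aligned or selected atoms, and the coordinates below are hypothetical toy values.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points.

    Assumes the structures are already superimposed and that point i in one
    list corresponds to point i in the other.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for a, b in zip(coords_a, coords_b):
        # Squared Euclidean distance between the paired atoms.
        total += sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(total / len(coords_a))

# Toy coordinates in angstroms (hypothetical, not taken from any real model).
model_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_2 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 3.0, 0.0)]
print(round(rmsd(model_1, model_2), 3))
```

A lower value means the two predictions agree more closely; in practice an optimal superposition step would precede this calculation.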
8.
Nature ; 600(7889): 547-552, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34853475

ABSTRACT

There has been considerable recent progress in protein structure prediction using deep neural networks to predict inter-residue distances from amino acid sequences [1-3]. Here we investigate whether the information captured by such networks is sufficiently rich to generate new folded proteins with sequences unrelated to those of the naturally occurring proteins used in training the models. We generate random amino acid sequences, and input them into the trRosetta structure prediction network to predict starting residue-residue distance maps, which, as expected, are quite featureless. We then carry out Monte Carlo sampling in amino acid sequence space, optimizing the contrast (Kullback-Leibler divergence) between the inter-residue distance distributions predicted by the network and background distributions averaged over all proteins. Optimization from different random starting points resulted in novel proteins spanning a wide range of sequences and predicted structures. We obtained synthetic genes encoding 129 of the network-'hallucinated' sequences, and expressed and purified the proteins in Escherichia coli; 27 of the proteins yielded monodisperse species with circular dichroism spectra consistent with the hallucinated structures. We determined the three-dimensional structures of three of the hallucinated proteins, two by X-ray crystallography and one by NMR, and these closely matched the hallucinated models. Thus, deep networks trained to predict native protein structures from their sequences can be inverted to design new proteins, and such networks and methods should contribute alongside traditional physics-based models to the de novo design of proteins with new functions.


Subject(s)
Computer Neural Networks, Proteins, Amino Acid Sequence, X-Ray Crystallography, Hallucinations, Humans, Protein Conformation, Proteins/chemistry, Proteins/genetics
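The sampling loop described in the abstract above (Monte Carlo moves in sequence space that increase the Kullback-Leibler divergence from a background distribution) can be sketched in a few lines. This is only an illustration under strong assumptions: `toy_predict` is a hypothetical stand-in for the structure-prediction network and has no biological meaning, the background is assumed uniform, and the temperature and step count are arbitrary choices not taken from the paper.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same bins (all q > 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def toy_predict(sequence):
    """HYPOTHETICAL stand-in for a network: a 3-bin distribution that
    sharpens as the fraction of leucine grows. Purely illustrative."""
    frac = sequence.count("L") / len(sequence)
    flat = (1.0 - frac) / 3.0
    return [flat + frac, flat, flat]

BACKGROUND = [1.0 / 3.0] * 3  # featureless background (assumed uniform here)

def score(sequence):
    """Contrast between the predicted and background distributions."""
    return kl_divergence(toy_predict(sequence), BACKGROUND)

def mutate(sequence, rng):
    """Propose a single random point mutation."""
    pos = rng.randrange(len(sequence))
    return sequence[:pos] + rng.choice(AMINO_ACIDS) + sequence[pos + 1:]

def metropolis_maximize(start, steps, temperature, rng):
    """Metropolis sampling that favors higher-scoring sequences."""
    current, current_score = start, score(start)
    best, best_score = current, current_score
    for _ in range(steps):
        candidate = mutate(current, rng)
        candidate_score = score(candidate)
        delta = candidate_score - current_score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score

rng = random.Random(0)
start = "".join(rng.choice(AMINO_ACIDS) for _ in range(30))
best, best_score = metropolis_maximize(start, steps=2000, temperature=0.01, rng=rng)
print(round(score(start), 3), round(best_score, 3))
```

Replacing `toy_predict` with a real predictor of inter-residue distance distributions, and summing the divergence over residue pairs, would bring the sketch closer to the procedure the abstract describes.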
9.
Nat Protoc ; 16(12): 5634-5651, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34759384

ABSTRACT

The trRosetta (transform-restrained Rosetta) server is a web-based platform for fast and accurate protein structure prediction, powered by deep learning and Rosetta. With the input of a protein's amino acid sequence, a deep neural network is first used to predict the inter-residue geometries, including distances and orientations. The predicted geometries are then transformed into restraints to guide the structure prediction on the basis of direct energy minimization, which is implemented under the framework of Rosetta. The trRosetta server distinguishes itself from other similar structure prediction servers in terms of rapid and accurate de novo structure prediction. As an illustration, trRosetta was applied to two Pfam families with unknown structures, for which the predicted de novo models were estimated to have high accuracy. Nevertheless, to take advantage of homology modeling, homologous templates are used as additional inputs to the network automatically. In general, it takes ~1 h to predict the final structure for a typical protein with ~300 amino acids, using a maximum of 10 CPU cores in parallel in our cluster system. To enable large-scale structure modeling, a downloadable package of trRosetta with open-source code is available as well. Detailed guidance for using the package is also available in this protocol. The server and the package are available at https://yanglab.nankai.edu.cn/trRosetta/ and https://yanglab.nankai.edu.cn/trRosetta/download/ , respectively.


Subject(s)
Amino Acids/chemistry, Computational Biology/methods, Proteins/chemistry, Software, Amino Acid Sequence, Internet, Molecular Dynamics Simulation, Computer Neural Networks, alpha-Helical Protein Conformation, beta-Strand Protein Conformation, Protein Interaction Domains and Motifs, Thermodynamics
10.
Science ; 374(6573): eabm4805, 2021 Dec 10.
Article in English | MEDLINE | ID: mdl-34762488

ABSTRACT

Protein-protein interactions play critical roles in biology, but the structures of many eukaryotic protein complexes are unknown, and there are likely many interactions not yet identified. We take advantage of advances in proteome-wide amino acid coevolution analysis and deep-learning-based structure modeling to systematically identify and build accurate models of core eukaryotic protein complexes within the Saccharomyces cerevisiae proteome. We use a combination of RoseTTAFold and AlphaFold to screen through paired multiple sequence alignments for 8.3 million pairs of yeast proteins, identify 1505 likely to interact, and build structure models for 106 previously unidentified assemblies and 806 that have not been structurally characterized. These complexes, which have as many as five subunits, play roles in almost all key processes in eukaryotic cells and provide broad insights into biological function.


Subject(s)
Deep Learning, Multiprotein Complexes/chemistry, Multiprotein Complexes/metabolism, Protein Interaction Mapping, Proteome/chemistry, Saccharomyces cerevisiae Proteins/chemistry, Saccharomyces cerevisiae Proteins/metabolism, Acyltransferases/chemistry, Acyltransferases/metabolism, Chromosome Segregation, Computational Biology, Computer Simulation, DNA Repair, Molecular Evolution, Homologous Recombination, Ligases/chemistry, Ligases/metabolism, Membrane Proteins/chemistry, Membrane Proteins/metabolism, Molecular Models, Protein Biosynthesis, Protein Conformation, Protein Interaction Maps, Proteome/metabolism, Ribosomes/metabolism, Saccharomyces cerevisiae/chemistry, Ubiquitin/chemistry, Ubiquitin/metabolism
11.
Proteins ; 89(12): 1824-1833, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34324224

ABSTRACT

For CASP14, we developed deep learning-based methods for predicting homo-oligomeric and hetero-oligomeric contacts and used them for oligomer modeling. To build structure models, we developed an oligomer structure generation method that utilizes predicted interchain contacts to guide iterative restrained minimization from random backbone structures. We supplemented this gradient-based fold-and-dock method with template-based and ab initio docking approaches using deep learning-based subunit predictions on 29 assembly targets. These methods produced oligomer models with summed Z-scores 5.5 units higher than the next best group, with the fold-and-dock method having the best relative performance. Over the eight targets for which this method was used, the best of the five submitted models had average oligomer TM-score of 0.71 (average oligomer TM-score of the next best group: 0.64), and explicit modeling of inter-subunit interactions improved modeling of six out of 40 individual domains (ΔGDT-TS > 2.0).


Subject(s)
Molecular Models, Protein Conformation, Proteins, Software, Computational Biology, Protein Databases, Deep Learning, Protein Binding, Protein Subunits/chemistry, Protein Subunits/metabolism, Proteins/chemistry, Proteins/metabolism, Protein Sequence Analysis
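The TM-scores quoted in the abstract above (0.71 versus 0.64) are easier to interpret with the scoring formula in hand. The sketch below implements the standard single-chain TM-score definition for one fixed superposition: the mean over the target length of 1 / (1 + (d_i / d0)^2), with d0 = 1.24 (L - 15)^(1/3) - 1.8. It is an illustration, not the assessors' code: the full score takes the maximum over superpositions, and the "oligomer TM-score" used in the paper may be computed differently over the whole complex.

```python
def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: distances in angstroms between aligned residue pairs.
    target_length: number of residues in the reference (target) structure.
    """
    if target_length <= 15:
        raise ValueError("the d0 formula is not defined for 15 residues or fewer")
    # Length-dependent distance scale that makes the score size-independent.
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0.0:
        raise ValueError("chain too short for a positive d0")
    # Unaligned target residues contribute zero, so divide by the target length.
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length

# A perfect model of a 100-residue target, and one with only half the residues aligned.
print(tm_score([0.0] * 100, 100))
print(tm_score([0.0] * 50, 100))
```

Because every aligned pair contributes at most 1 and the sum is divided by the target length, the score lies between 0 and 1, with higher values indicating closer agreement with the reference.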
12.
Proteins ; 89(12): 1722-1733, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34331359

ABSTRACT

The trRosetta structure prediction method employs deep learning to generate predicted residue-residue distance and orientation distributions from which 3D models are built. We sought to improve the method by incorporating as inputs (in addition to sequence information) both language model embeddings and template information weighted by sequence similarity to the target. We also developed a refinement pipeline that recombines models generated by template-free and template-utilizing versions of trRosetta guided by the DeepAccNet accuracy predictor. Both benchmark tests and CASP results show that the new pipeline is a considerable improvement over the original trRosetta, and it is faster and requires fewer computing resources, completing the entire modeling process in a median of less than 3 h in CASP14. Our human group improved results with this pipeline primarily by identifying additional homologous sequences for input into the network. We also used the DeepAccNet accuracy predictor to guide Rosetta high-resolution refinement for submissions in the regular and refinement categories; although performance was quite good on a CASP relative scale, the overall improvements were rather modest, in part due to missing inter-domain or inter-chain contacts.


Subject(s)
Computational Biology/methods, Deep Learning, Protein Tertiary Structure, Proteins, Software, Humans, Metagenome/genetics, Proteins/chemistry, Proteins/genetics, Proteins/metabolism, Protein Sequence Analysis
13.
Science ; 373(6557): 871-876, 2021 Aug 20.
Article in English | MEDLINE | ID: mdl-34282049

ABSTRACT

DeepMind presented notably accurate predictions at the recent 14th Critical Assessment of Structure Prediction (CASP14) conference. We explored network architectures that incorporate related ideas and obtained the best performance with a three-track network in which information at the one-dimensional (1D) sequence level, the 2D distance map level, and the 3D coordinate level is successively transformed and integrated. The three-track network produces structure predictions with accuracies approaching those of DeepMind in CASP14, enables the rapid solution of challenging x-ray crystallography and cryo-electron microscopy structure modeling problems, and provides insights into the functions of proteins of currently unknown structure. The network also enables rapid generation of accurate protein-protein complex models from sequence information alone, short-circuiting traditional approaches that require modeling of individual subunits followed by docking. We make the method available to the scientific community to speed biological research.


Subject(s)
Deep Learning, Protein Conformation, Protein Folding, Proteins/chemistry, ADAM Proteins/chemistry, Amino Acid Sequence, Computer Simulation, Cryoelectron Microscopy, X-Ray Crystallography, Protein Databases, Membrane Proteins/chemistry, Molecular Models, Multiprotein Complexes/chemistry, Computer Neural Networks, Protein Subunits/chemistry, Proteins/physiology, G-Protein-Coupled Receptors/chemistry, Sphingosine N-Acyltransferase/chemistry
14.
Proc Natl Acad Sci U S A ; 118(11), 2021 Mar 16.
Article in English | MEDLINE | ID: mdl-33712545

ABSTRACT

The protein design problem is to identify an amino acid sequence that folds to a desired structure. Given Anfinsen's thermodynamic hypothesis of folding, this can be recast as finding an amino acid sequence for which the desired structure is the lowest energy state. As this calculation involves not only all possible amino acid sequences but also all possible structures, most current approaches focus instead on the more tractable problem of finding the lowest-energy amino acid sequence for the desired structure, often checking by protein structure prediction in a second step that the desired structure is indeed the lowest-energy conformation for the designed sequence, and typically discarding a large fraction of designed sequences for which this is not the case. Here, we show that by backpropagating gradients through the transform-restrained Rosetta (trRosetta) structure prediction network from the desired structure to the input amino acid sequence, we can directly optimize over all possible amino acid sequences and all possible structures in a single calculation. We find that trRosetta calculations, which consider the full conformational landscape, can be more effective than Rosetta single-point energy estimations in predicting folding and stability of de novo designed proteins. We compare sequence design by conformational landscape optimization with the standard energy-based sequence design methodology in Rosetta and show that the former can result in energy landscapes with fewer alternative energy minima. We show further that more funneled energy landscapes can be designed by combining the strengths of the two approaches: the low-resolution trRosetta model serves to disfavor alternative states, and the high-resolution Rosetta model serves to create a deep energy minimum at the design target structure.


Subject(s)
Computer Neural Networks, Proteins/chemistry, Molecular Models, Protein Conformation, Protein Folding, Thermodynamics
15.
Sci Rep ; 11(1): 4290, 2021 Feb 22.
Article in English | MEDLINE | ID: mdl-33619344

ABSTRACT

Rapid generation of diagnostics is paramount to understand epidemiology and to control the spread of emerging infectious diseases such as COVID-19. Computational methods to predict serodiagnostic epitopes that are specific for the pathogen could help accelerate the development of new diagnostics. A systematic survey of 27 SARS-CoV-2 proteins was conducted to assess whether existing B-cell epitope prediction methods, combined with comprehensive mining of sequence databases and structural data, could predict whether a particular protein would be suitable for serodiagnosis. Nine of the predictions were validated with recombinant SARS-CoV-2 proteins in the ELISA format using plasma and sera from patients with SARS-CoV-2 infection, and a further 11 predictions were compared to the recent literature. Results appeared to be in agreement with 12 of the predictions, in disagreement with 3, while a further 5 were deemed inconclusive. We showed that two of our top five candidates, the N-terminal fragment of the nucleoprotein and the receptor-binding domain of the spike protein, have the highest sensitivity and specificity and signal-to-noise ratio for detecting COVID-19 sera/plasma by ELISA. Mixing the two antigens together for coating ELISA plates led to a sensitivity of 94% (N = 80 samples from persons with RT-PCR confirmed SARS-CoV-2 infection), and a specificity of 97.2% (N = 106 control samples).


Subject(s)
COVID-19/diagnosis, COVID-19/immunology, Enzyme-Linked Immunosorbent Assay/methods, B-Lymphocyte Epitopes/immunology, SARS-CoV-2/pathogenicity, Humans, Real-Time Polymerase Chain Reaction, SARS-CoV-2/immunology, Signal-to-Noise Ratio
16.
Nat Commun ; 12(1): 1340, 2021 Feb 26.
Article in English | MEDLINE | ID: mdl-33637700

ABSTRACT

We develop a deep learning framework (DeepAccNet) that estimates per-residue accuracy and residue-residue distance signed error in protein models and uses these predictions to guide Rosetta protein structure refinement. The network uses 3D convolutions to evaluate local atomic environments followed by 2D convolutions to provide their global contexts and outperforms other methods that similarly predict the accuracy of protein structure models. Overall accuracy predictions for X-ray and cryoEM structures in the PDB correlate with their resolution, and the network should be broadly useful for assessing the accuracy of both predicted structure models and experimentally determined structures and identifying specific regions likely to be in error. Incorporation of the accuracy predictions at multiple stages in the Rosetta refinement protocol considerably increased the accuracy of the resulting protein structure models, illustrating how deep learning can improve search for global energy minima of biomolecules.


Subject(s)
Computational Biology/methods, Deep Learning, Proteins/chemistry, Algorithms, Caspases/chemistry, Biological Models, Molecular Models, Protein Conformation, Software
17.
IUCrJ ; 7(Pt 5): 881-892, 2020 Sep 01.
Article in English | MEDLINE | ID: mdl-32939280

ABSTRACT

Cryo-electron microscopy of protein complexes often leads to moderate resolution maps (4-8 Å), with visible secondary-structure elements but poorly resolved loops, making model building challenging. In the absence of high-resolution structures of homologues, only coarse-grained structural features are typically inferred from these maps, and it is often impossible to assign specific regions of density to individual protein subunits. This paper describes a new method for overcoming these difficulties that integrates predicted residue distance distributions from a deep-learned convolutional neural network, computational protein folding using Rosetta, and automated EM-map-guided complex assembly. We apply this method to a 4.6 Å resolution cryoEM map of Fanconi Anemia core complex (FAcc), an E3 ubiquitin ligase required for DNA interstrand crosslink repair, which was previously challenging to interpret as it comprises 6557 residues, only 1897 of which are covered by homology models. In the published model built from this map, only 387 residues could be assigned to the specific subunits with confidence. By building and placing into density 42 deep-learning-guided models containing 4795 residues not included in the previously published structure, we are able to determine an almost-complete atomic model of FAcc, in which 5182 of the 6557 residues were placed. The resulting model is consistent with previously published biochemical data, and facilitates interpretation of disease-related mutational data. We anticipate that our approach will be broadly useful for cryoEM structure determination of large complexes containing many subunits for which there are no homologues of known structure.
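The residue counts in the abstract above are easier to compare as fractions of the 6557-residue complex. The short calculation below uses only numbers stated in the abstract; the labels are paraphrases. Note that 387 + 4795 = 5182, which suggests (the abstract does not say so explicitly) that the final count combines the previously assigned residues with the newly modeled ones.

```python
TOTAL_RESIDUES = 6557  # size of the complex, as stated in the abstract

def percent_of_complex(n_residues, total=TOTAL_RESIDUES):
    """Share of the complex, in percent, represented by n_residues."""
    return 100.0 * n_residues / total

# Residue counts quoted in the abstract (labels paraphrased).
counts = {
    "covered by homology models": 1897,
    "assigned in the published model": 387,
    "in the new deep-learning-guided models": 4795,
    "placed in the final model": 5182,
}

for label, n in counts.items():
    print(f"{label}: {n} residues ({percent_of_complex(n):.1f}%)")
```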

18.
Proc Natl Acad Sci U S A ; 117(29): 17003-17010, 2020 Jul 21.
Article in English | MEDLINE | ID: mdl-32632011

ABSTRACT

Rubicon is a potent negative regulator of autophagy and a potential target for autophagy-inducing therapeutics. Rubicon-mediated inhibition of autophagy requires the interaction of the C-terminal Rubicon homology (RH) domain of Rubicon with Rab7-GTP. Here we report the 2.8-Å crystal structure of the Rubicon RH domain in complex with Rab7-GTP. Our structure reveals a fold for the RH domain built around four zinc clusters. The switch regions of Rab7 insert into pockets on the surface of the RH domain in a mode that is distinct from those of other Rab-effector complexes. Rubicon residues at the dimer interface are required for Rubicon and Rab7 to colocalize in living cells. Mutation of Rubicon RH residues in the Rab7-binding site restores efficient autophagic flux in the presence of overexpressed Rubicon, validating the Rubicon RH domain as a promising therapeutic target.


Subject(s)
Autophagy-Related Proteins, Autophagy/physiology, rab GTP-Binding Proteins, Autophagy-Related Proteins/chemistry, Autophagy-Related Proteins/metabolism, Autophagy-Related Proteins/physiology, X-Ray Crystallography, HeLa Cells, Humans, Molecular Models, Protein Binding, Protein Domains/physiology, rab GTP-Binding Proteins/chemistry, rab GTP-Binding Proteins/metabolism, rab GTP-Binding Proteins/physiology, rab7 GTP-Binding Proteins
19.
Proc Natl Acad Sci U S A ; 117(3): 1496-1503, 2020 Jan 21.
Article in English | MEDLINE | ID: mdl-31896580

ABSTRACT

The prediction of interresidue contacts and distances from coevolutionary data using deep learning has considerably advanced protein structure prediction. Here, we build on these advances by developing a deep residual network for predicting interresidue orientations, in addition to distances, and a Rosetta-constrained energy-minimization protocol for rapidly and accurately generating structure models guided by these restraints. In benchmark tests on 13th Community-Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP13)- and Continuous Automated Model Evaluation (CAMEO)-derived sets, the method outperforms all previously described structure-prediction methods. Although trained entirely on native proteins, the network consistently assigns higher probability to de novo-designed proteins, identifying the key fold-determining residues and providing an independent quantitative measure of the "ideality" of a protein structure. The method promises to be useful for a broad range of protein structure prediction and design problems.


Subject(s)
Protein Conformation, Protein Sequence Analysis/methods, Software, Animals, Deep Learning, Humans
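The interresidue orientations mentioned in the abstract above are angular quantities defined on atoms of two residues. As a minimal sketch of the underlying geometry, the function below computes a signed dihedral angle from four points using the common atan2 formulation. Which atoms define each orientation angle is not stated in the abstract and is left to the caller; the coordinates in the example are hypothetical, and sign conventions differ between tools.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 axis."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical points: the last atom is rotated a quarter turn about the central bond.
angle = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
print(round(abs(angle), 1))
```

A planar cis arrangement gives an angle of 0 degrees and a planar trans arrangement gives a magnitude of 180 degrees, which is a quick sanity check for any sign convention.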
20.
Bioinformatics ; 36(1): 41-48, 2020 Jan 01.
Article in English | MEDLINE | ID: mdl-31173061

ABSTRACT

MOTIVATION: Almost all protein residue contact prediction methods rely on the availability of deep multiple sequence alignments (MSAs). However, many proteins from poorly populated families do not have a sufficient number of homologs in the conventional UniProt database. Here we aim to solve this issue by exploring the rich sequence data from metagenome sequencing projects. RESULTS: Based on the improved MSA constructed from the metagenome sequence data, we developed MapPred, a new deep learning-based contact prediction method. MapPred consists of two component methods, DeepMSA and DeepMeta, both trained with residual neural networks. DeepMSA was inspired by the recent method DeepCov, which was trained on 441 matrices of covariance features. By considering the symmetry of the contact map, we reduced the number of matrices to 231, which makes the training more efficient in DeepMSA. Experiments show that DeepMSA outperforms DeepCov by 10-13% in precision. DeepMeta works by combining predicted contacts and other sequence profile features. Experiments on three benchmark datasets suggest that the contribution from the metagenome sequence data is significant, with P-values less than 4.04E-17. MapPred is shown to be complementary and comparable to the state-of-the-art methods. The success of MapPred is attributed to three factors: the deeper MSA from the metagenome sequence data, improved feature design in DeepMSA and optimized training by the residual neural networks. AVAILABILITY AND IMPLEMENTATION: http://yanglab.nankai.edu.cn/mappred/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology, Metagenome, Computer Neural Networks, Protein Sequence Analysis, Algorithms, Computational Biology/methods, Proteins/chemistry, Sequence Alignment, Protein Sequence Analysis/methods
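The reduction from 441 to 231 feature matrices reported in the abstract above matches a simple counting argument. The sketch below assumes a 21-symbol alphabet (20 amino acids plus a gap), which is consistent with 441 = 21 x 21 but is not stated in the abstract: counting ordered symbol pairs gives 441, and treating a pair and its reverse as the same feature gives 231.

```python
from itertools import combinations_with_replacement, product

# Assumed alphabet: 20 standard amino acids plus a gap symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"

# Every ordered pair (a, b) gets its own covariance matrix.
ordered_pairs = list(product(ALPHABET, repeat=2))

# With symmetry, (a, b) and (b, a) share one matrix.
unordered_pairs = list(combinations_with_replacement(ALPHABET, 2))

print(len(ordered_pairs), len(unordered_pairs))  # 441 231
```

In general an alphabet of n symbols yields n * n ordered pairs and n * (n + 1) / 2 unordered pairs, so the symmetric representation needs roughly half as many input channels.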